The Senate Legislative Activities Collection (SLA): a Case Study Infrastructure Research to Support Preservation Strategies
Abstract
Knowledge-based persistent archives manage digital objects, information attributes about the digital objects, and the implied knowledge that is needed to understand relationships between the information attributes. This report illustrates the processes used to ingest data, information, and knowledge in the context of the Senate Legislative Activities collection.

1. Preservation Strategies

Persistent archives provide not only long-term management of digital objects, but also the mechanisms to discover and access a desired data set. Discovery requires a context for identifying the desired data set. Collections can be used to organize the context as a set of attributes associated with each digital object. Discovery is done as a query against the collection. This approach then requires the ability to manage persistent collections, as well as persistent storage of data [1].

Collections of digital objects contain implied knowledge. The knowledge can be thought of as relationships that are inherent between the collection attributes, or the processes that were used to create the digital objects, or even the processes that were used to assemble the collection. The implied knowledge can be expressed as relationships between concepts. The capabilities of a persistent archive must then include long-term storage of digital objects, collections, and concept spaces [2].

In this paper, we examine the ingestion processes used to identify information and knowledge. In particular, we demonstrate an ingestion process that can be used to characterize both the implied knowledge and the associated information attributes for a collection of digital objects.
The ingestion process requires the identification of the concepts that will be associated with the collection, the specification of attributes that are associated with each concept, an analysis of the closure of the attributes relative to the knowledge concept space, and an analysis of the completeness of coverage of the information content by the chosen attributes.

The knowledge and information ingestion demonstration is done within the context of the Senate Legislative Activities collection (SLA). The Library of Congress maintains the Thomas collection, which keeps track of bills, resolutions, and amendments (BARs for short) sponsored by each senator. The particular form of the collection that we ingested is an extract of the 106th Congress database. The SLA is provided on a CD-ROM as files in Rich Text Format (RTF). We expected 100 files, one per senator, but only 99 were delivered: the RTF file for Phil Gramm, Senator from Texas, was not included in our copy. Each file contains BARs grouped by sponsorship, cosponsorship, and referral committee, as well as a master subject index, and reflects a particular senator’s legislative contribution over the course of the 106th Congress. For more details on the physical structure of the SLA, please see Appendix A.

The process we used to ingest the collection is described in detail in reference [2]. In summary, we perform the following steps:

• Create a knowledge space that defines the concepts that will be used to organize the collection. For the SLA, the concepts can be based on the procedural stages through which a bill passes.
• Define the information attributes needed to describe each concept in the knowledge space.
• Tag the attributes using the eXtensible Markup Language [3]. This makes it much easier to automate the remaining analysis steps.
• Determine the closure of the knowledge description. Are all implied knowledge relationships within the collection defined? Are there any remaining implicit relationships that require knowledge about the process for creating a bill in order to interpret the collection correctly?
• Determine the completeness of the attribute selection. Are any of the attributes compound, in the sense that the attribute contains multiple types of data? Are all of the attributes populated? Are there anomalous values present in any attribute? Do attribute values appear more times than expected within a given section?
• Evaluate the presence of artifacts or anomalies within the collection. Every deviation of attribute values from the expected range or expected number of occurrences represents either implied knowledge that has not been characterized, or an anomaly in the collection.

The explicit characterization of the implied knowledge within a collection makes it possible to identify and tag all anomalies. In essence, by tracking knowledge closure and information completeness, it is possible to identify all collection artifacts and produce a version of the collection that represents the creator’s original intent.

1. KNOWLEDGE GENERATION

Every collection contains implied knowledge. This can be as simple as the rule for creating a unique senator name (last name, (state, (first name))), or the meaning of unique terminology used within the collection. A persistent archive must provide an explicit representation for the implicit knowledge concepts, to ensure that future generations will be able to analyze the digital objects.

Accessioning Template, AT

To determine the necessary and sufficient elements of an SLA electronic record, we create an initial accessioning template, AT, named after the notion of “Template for Analysis” [4].
Whereas the Template for Analysis identifies and defines all the possible elements that a record may contain (including medium, extrinsic elements, intrinsic elements, annotations, and context), the AT primarily identifies and defines the “Documentary Form”, meaning an abstraction of the internal structure of the electronic records. An AT is a working hypothesis or model about the necessary and sufficient data elements of a record. As such, it is a conceptual model of the collection and can be expressed in any suitable formalism: IDEF0 model, topic map, RDF, ER-diagram, UML, etc.

Using the AT to Reach “Closure”

The AT is used to select attributes that serve to extract content from the input collection based on pattern-matching ingestion rules. These attributes specify which elements are extracted or tagged during the ingestion. We call this phase Attribute Tagging. Attribute Tagging is based on a simple event-driven pattern-matching process:

repeat
    if match found for attribute then
        perform attribute tagging
until done
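The Attribute Tagging loop above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ingestion code used for the SLA: the rule table `RULES`, the attribute names `bill_number` and `date`, the regular expressions, and the sample text are all hypothetical. The sketch scans the text from left to right, wraps each match in an XML element named after the attribute, and counts how often each attribute was found.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical ingestion rules (attribute name -> pattern). These are
# illustrative assumptions, not the rules used for the SLA collection.
# Each pattern is assumed never to match the empty string.
RULES = {
    "bill_number": re.compile(r"\bS\.\s?\d+\b"),
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def tag_attributes(text, rules):
    """Wrap every rule match in an XML element; return (tagged_text, counts)."""
    out = []
    counts = {name: 0 for name in rules}
    pos = 0
    while pos < len(text):  # "repeat ... until done"
        # Pick the earliest match among all rules from the current position.
        best = None
        for name, pattern in rules.items():
            m = pattern.search(text, pos)
            if m and (best is None or m.start() < best[1].start()):
                best = (name, m)
        if best is None:  # no further matches: done
            break
        name, m = best
        out.append(escape(text[pos:m.start()]))  # untagged text before the match
        out.append(f"<{name}>{escape(m.group())}</{name}>")  # attribute tagging
        counts[name] += 1
        pos = m.end()
    out.append(escape(text[pos:]))  # trailing untagged text
    return "".join(out), counts


if __name__ == "__main__":
    sample = "S. 123 introduced 1/19/1999; see also S. 45 & related."
    tagged, counts = tag_attributes(sample, RULES)
    print(tagged)
    print(counts)
```

The per-attribute counts give a first, rough look at the completeness questions raised in the ingestion steps: an attribute with a count of zero is unpopulated, and a count higher than expected for a section points to either uncharacterized knowledge or an anomaly.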
Publication date: 2001